ABSTRACT

Six anomalies in achievement test scores encountered by the Austin
Independent School District are described. These include crossing gaps
with uninterpolated medians; total group median declines while all
subgroups' medians rise; outlying total percentiles; percentile and
grade equivalent growth antithesis; same grade equivalent earning a
different percentile in each content area; and the median does not
represent any group. Evaluators and researchers must know how to
distinguish real achievement gains from artifactual gains resulting
from anomalies such as those discussed in this paper. It is necessary
to determine when an inconsistency is an error and when it is an
explainable anomaly. When interpreting achievement test scores,
interaction of types of scores such as percentiles and grade
equivalents, shifts in student demographics, and non-normal
distributions within groups being tested need to be carefully
considered. The factors causing the anomalies and possible solutions
are discussed. (DWH)

Maybe we missed a few classes in our graduate statistics courses, or
possibly our problems are not technical enough to merit journal
articles. At any rate, we were temporarily stunned when unexpected
anomalies and mystifying inconsistencies began to haunt our reporting
of achievement test scores. We also had been collecting a list of
questions which often confused teachers and other school staff. This
paper pulls these anomalies and questions together to serve as a
reference for anyone who reports achievement test results.

• Can each ethnic group gain more in a year than does the total group
  combined?

• Can each ethnic group gain while the total group's median declines?

• Can a group's percentile median on each subtest be higher than the
  group's median on the total score?

• Can a student gain a year in grade equivalents and lose percentile
  points?

• Why is the same grade equivalent equal to two different percentile
  ranks in reading and math?

• Can the gap between two ethnic groups' achievement decrease at each
  individual grade level from one year to the next but continue to
  widen from one grade to the next?

• Can a school's median percentile misrepresent the actual student
  population?

This paper approaches these issues from a practitioner's perspective.
As evaluators and researchers who report achievement test scores, we
need to understand when an inconsistency is an error and when it is
merely an explainable anomaly.

ANOMALY 1: CROSSING GAPS WITH UNINTERPOLATED MEDIANS
(Changing Subtly by Leaps and Bounds)

When two or more subgroups or components change in a positive or negative
direction from one specific point in time to another point in time, we expect
the total group to reflect this change and be the sum total of the changes,
or at least as large as the smallest subgroup change. For example (see
Figure 1), if these three groups change by five points, we might expect the
total group to change by five points.

In the case of AISD districtwide achievement results, this pattern does not
follow, as we found out in the 1980-81 school year. That was the second year
in which we used the Iowa Tests of Basic Skills (ITBS), so we were anxious to
see how our achievement changed from the initial year of ITBS use, 1979-80.
The achievement results were particularly important because 1980-81 was the
first year of large-scale, court-ordered busing for desegregation.

In AISD the junior high students are tested in February, so they were the
first "test case" for us of the ITBS norms over time.

Our initial analysis of the grade 7 Reading Total (RT) results by ethnicity
looked excellent in terms of percentile gains (see Figure 2). Our District
RT median percentile score for Blacks rose by 7 %ile points, as did the
District RT median percentile score for Hispanics. We looked at the RT score
for our Anglo/Other students and found that it had risen by 4 %ile points.
By this time we were ecstatic. We looked eagerly for the District score
for all students tested and found only a 2 %ile point increase.
Disappointment set in. How could such a small overall increase result from
such large subgroup increases?

As believers in fully checking out our numbers, we ran a frequency
distribution of the scores to verify that the middle score was in fact at the
median score that we had calculated. It was. The problem now became
explaining these results to our School Board and the Austin public. What
happened to cause this anomaly in the scores?

First, for a given test, all 99 percentile ranks may not be achievable. The
gaps between achievable percentiles vary in size at different points in the
distribution. Typically in the middle percentile ranges, not all percentiles
are possible, while at the extremes each percentile rank is possible. In
the case of our grade 7 RT scores, a small change in raw score moved each
ethnic group's uninterpolated median across a gap which was larger than the
gap spanned by the change in the total group median. The net result was that
our RT gains by ethnicity were impressive, using median percentiles, while
our gain as a District was not as impressive.

As noted in Figure 3, the gain in terms of grade equivalent points for RT was
smaller for the Anglo students than for the Total group, but the percentile
gain was four points compared to two points. This change of four percentile
points was the smallest positive change possible at that point on the RT
percentile scale, while at the middle of the scale a positive change was
limited to two percentile points.




Secondly, because the medians for each year and each group were all
independently calculated, this possibility of large increases in subgroup
scores and smaller increases in the total group can always exist. These
independent median calculations are not direct functions of each other. The
subgroup medians do not have direct influence on the total group median
score. Therefore the expected relationships, as seen in the first figure, do
not hold and should not be expected.
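
A small illustration may make this independence concrete. The sketch below
uses an invented norms table and nine invented raw scores (neither comes
from AISD or ITBS data). Every subgroup median and the pooled total median
rise by exactly one raw-score point, yet the percentile gains differ because
the gaps in the table differ.

```python
from statistics import median_low

# Hypothetical norms table (raw score -> percentile rank); not ITBS values.
NORMS = {10: 20, 11: 27, 12: 30, 13: 37, 16: 50, 17: 52, 20: 70, 21: 74}

# Hypothetical raw scores for two years, three students per subgroup.
YEAR_1 = {"Black": [9, 10, 16], "Hispanic": [10, 12, 16], "Anglo/Other": [17, 20, 21]}
YEAR_2 = {"Black": [9, 11, 17], "Hispanic": [10, 13, 17], "Anglo/Other": [17, 21, 21]}


def median_percentile(raw_scores):
    """Uninterpolated median: percentile of the actual middle-scoring student."""
    return NORMS[median_low(raw_scores)]


def percentile_gains(year_1, year_2):
    """Percentile gain of each subgroup median and of the pooled total median."""
    gains = {
        group: median_percentile(year_2[group]) - median_percentile(year_1[group])
        for group in year_1
    }
    pooled_1 = [score for scores in year_1.values() for score in scores]
    pooled_2 = [score for scores in year_2.values() for score in scores]
    gains["Total"] = median_percentile(pooled_2) - median_percentile(pooled_1)
    return gains


GAINS = percentile_gains(YEAR_1, YEAR_2)
print(GAINS)
```

With these made-up numbers the subgroup medians gain 7, 7, and 4 percentile
points while the total median gains 2, the same pattern as in our grade 7
results, without any error in the calculations.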

Our response to the problems encountered in this anomaly of large increases
in subgroup scores but small increases in the total group scores is twofold.
First, we are investigating the use of an interpolated median percentile
score. As we found out over the past few years, a shift in the scores of a
few students by a single point can create a large difference in the median
percentile point when based upon the actual middle-scoring student. If this
shift is near a large gap in the percentile tables, the resulting median
score may not provide the most accurate picture of districtwide achievement.
An interpolated median percentile will allow for a score which, although not
truly attainable, will more accurately reflect the "middle" of the score
distribution. We feel this will eliminate random increases/decreases in
districtwide averages, which may not be actual changes in achievement but
rather artifacts of the method used to calculate the median percentile. The
use of interpolated median percentile points should more accurately assess
"true" changes in achievement over time.
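
The interpolation formula is not specified here, so the following is only a
sketch under stated assumptions: it uses the common grouped-data median
(each raw score treated as an interval of width 1) followed by linear
interpolation in a hypothetical norms table. The table, the scores, and the
function names are all invented for illustration.

```python
import math

# Hypothetical norms table (raw score -> percentile rank); not ITBS values.
NORMS = {20: 38, 21: 42, 22: 46, 23: 50, 24: 54}


def uninterpolated_median_percentile(raw_scores):
    """Percentile of the actual middle-scoring student."""
    ordered = sorted(raw_scores)
    return NORMS[ordered[(len(ordered) - 1) // 2]]


def interpolated_median_raw(raw_scores):
    """Grouped-data median, treating each raw score as an interval of width 1."""
    ordered = sorted(raw_scores)
    n = len(ordered)
    middle = ordered[(n - 1) // 2]
    below = sum(1 for score in ordered if score < middle)
    at_middle = ordered.count(middle)
    return (middle - 0.5) + (n / 2 - below) / at_middle


def interpolated_median_percentile(raw_scores):
    """Linear interpolation in the norms table at the interpolated raw median."""
    m = interpolated_median_raw(raw_scores)
    lower = math.floor(m)
    fraction = m - lower
    if fraction == 0:
        return float(NORMS[lower])
    return NORMS[lower] + fraction * (NORMS[lower + 1] - NORMS[lower])


# Hypothetical data: between the two years a single student moves from 21 to 22.
YEAR_1 = [20, 20, 21, 21, 21, 21, 22, 22, 23, 24, 24]
YEAR_2 = [20, 20, 21, 21, 21, 22, 22, 22, 23, 24, 24]

for label, scores in (("year 1", YEAR_1), ("year 2", YEAR_2)):
    print(label,
          uninterpolated_median_percentile(scores),
          round(interpolated_median_percentile(scores), 2))
```

In this example the uninterpolated median jumps a full table gap (42 to 46)
because one student changed by one raw-score point, while the interpolated
median moves by little more than one percentile point.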

We also plan to give more emphasis to ITBS grade equivalent scores, which
were developed as an equal-interval scale. Through the use of grade
equivalent scores we hope to have a better representation of the size of
changes in achievement for groups in various ranges of the distribution.

Figure 1. Common-sense relationship between subgroup and total group
percentile scores over time.

Figure 3. A comparison of median percentile and grade equivalent scores by

ANOMALY 2: TOTAL GROUP'S MEDIAN DECLINES WHILE ALL SUBGROUPS' MEDIANS RISE
(How Three Positives Make a Negative)

Anomaly number two was discovered in the results of our April 1981 ITBS
elementary school testing. We encountered a case where the total median
percentile and grade equivalent scores actually dropped, even though the
ethnic subgroup median percentile and mean grade equivalent scores rose.
Again, on the surface, it seems like that is not possible. We rechecked the
data, carefully multiplying the mean grade equivalent score for each ethnic
group by the total number of students in that group. The results were
verified: three positives did in fact make a negative. How?

As seen in Figure 4, there was a shift in the school system's population by
ethnicity from 1980 to 1981. There was now a lower overall proportion of
Anglo students in the District. This higher achieving group exerted less
upward influence on the 1981 District total score. Even though every ethnic
group's mean grade equivalent score rose, the total was influenced less by
the highest achieving group.

A second factor entering into the picture was a change in the percentage of
students taking the test in 1980 and in 1981, by ethnicity. An increase in
the percentage of Black and Hispanic students tested in 1981 over 1980 raised
the proportion of lower achieving minority students represented in the
districtwide mean grade equivalent score.
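
The arithmetic behind both factors is that of a weighted mean. As a minimal
sketch, with enrollments and mean grade equivalents invented purely for
illustration (they are not AISD figures), every subgroup mean can rise while
the districtwide mean falls:

```python
# Hypothetical (number tested, mean grade equivalent) pairs; not AISD figures.
YEAR_1980 = {"Anglo/Other": (600, 4.50), "Black": (200, 3.00), "Hispanic": (200, 3.10)}
YEAR_1981 = {"Anglo/Other": (450, 4.55), "Black": (270, 3.05), "Hispanic": (280, 3.15)}


def total_mean(groups):
    """Mean for all students: subgroup means weighted by the number tested."""
    tested = sum(count for count, _ in groups.values())
    return sum(count * mean for count, mean in groups.values()) / tested


for group in YEAR_1980:
    change = YEAR_1981[group][1] - YEAR_1980[group][1]
    print(f"{group}: {change:+.2f}")

print(f"Total: {total_mean(YEAR_1981) - total_mean(YEAR_1980):+.3f}")
```

Each hypothetical subgroup gains 0.05 grade equivalent, but because the
highest-scoring group shrinks from 60 to 45 percent of those tested, the
total mean falls from 3.92 to about 3.75.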

With this second anomaly, the explanations of the test results are logical,
and even obvious when one concentrates on the phenomena involved. But if one
looks only at the numbers, the results alone, the achievement picture is
puzzling.

Our response to this anomaly, a decrease in total group score while the
subgroups increased, focused on estimating the impact of shifts in ethnicity
and the number of students tested. We calculated an estimate of the 1981
grade equivalent scores, based upon the 1980 scores. Achievement was held
constant, but we took into account the change in the number of students
tested by ethnicity. These estimated 1981 grade equivalent scores were
compared to the actual 1981 scores to determine the expected change in
achievement which could be attributed to this shift in the ethnic composition
and number of students tested.

Through the use of these projected scores, AISD scores in reading would be
"expected" to be lower in 1981 in grades 1-7 and higher in grade 8. A
comparison of these projected scores with actual 1981 achievement indicated
that:

• achievement improved rather than declined in grades 1, 2, and 5-7.

• achievement in grades 3 and 4 declined some, but no more than expected.

• achievement in grade 8 improved more than expected.
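
The projection described above can be sketched as follows. This is one
plausible reading of the procedure, not a record of the actual computation:
the prior year's subgroup means are held constant and reweighted by the
current year's numbers tested. All figures and names are hypothetical.

```python
# Hypothetical (number tested, mean grade equivalent) pairs; not AISD figures.
SCORES_1980 = {"Anglo/Other": (600, 4.50), "Black": (200, 3.00), "Hispanic": (200, 3.10)}
SCORES_1981 = {"Anglo/Other": (450, 4.55), "Black": (270, 3.05), "Hispanic": (280, 3.15)}


def actual_total(current):
    """Observed total mean: current means weighted by current numbers tested."""
    tested = sum(count for count, _ in current.values())
    return sum(count * mean for count, mean in current.values()) / tested


def projected_total(previous, current):
    """Expected total mean if achievement had not changed: previous year's
    subgroup means weighted by the current year's numbers tested."""
    tested = sum(count for count, _ in current.values())
    return sum(current[g][0] * previous[g][1] for g in current) / tested


PROJECTED = projected_total(SCORES_1980, SCORES_1981)
ACTUAL = actual_total(SCORES_1981)
print(f"projected {PROJECTED:.3f}, actual {ACTUAL:.3f}, "
      f"difference {ACTUAL - PROJECTED:+.3f}")
```

In this invented case the actual total is below the prior year's total but
above the projection, which would be reported as achievement improving
rather than declining once the demographic shift is taken into account.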

We are also reporting longitudinal data for students who have been tested
every year, thus making our year-to-year comparisons on the same students
rather than merely on groups whose make-up might shift.

Figure 4. Comparison of changes in mean grade equivalent scores from 1980
          to 1981. ITBS, Reading Total, Grade 3, AISD.

ANOMALY 3: THE OUTLYING TOTAL PERCENTILES
(The Junior High Principal Panic)

We encountered this issue while interpreting median and quartile scores for a
group of junior high principals. Although our computer programs had been
checked a dozen times, the principals noticed that their totals were often
noticeably lower or higher than their subtests. Controlled panic began. Of
course, everything was calculated correctly, and another anomaly was added
to our list.

The psychometrically naive educator expects total scores to be somehow
arithmetically a function of subtest scores. Unfortunately, the farther away
from the 50th percentile that scores fall, the more likely that the total
percentile will be farther away from 50 than are all the subtest percentiles.
Figure 5 presents examples to illustrate this anomaly.

When all subtest percentiles are consistently low (or high), the percentile
for the total test will usually be even lower (or higher) rather than being
about midway among the subtest percentiles. The explanation for this lies in
the nature of the score distributions. An individual student may score very
low on one subtest but somewhat higher on the others. A pattern of very low
scores on all subtests is less common and results in a total score which
falls even lower in the distribution (i.e., receives a lower percentile
rank).
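
A toy model shows the effect exactly. The sketch below assumes two
hypothetical subtests, each scored 1 to 10 with every score equally likely
and the two subtests independent of each other; real subtests are positively
correlated, which weakens the effect but does not remove it. "Percent
below" stands in here for a percentile rank.

```python
from itertools import product

SUBTEST_SCORES = range(1, 11)  # hypothetical: scores 1..10, equally likely

# Every equally likely outcome for one subtest and for the two-subtest total.
SUBTEST_OUTCOMES = list(SUBTEST_SCORES)
TOTAL_OUTCOMES = [a + b for a, b in product(SUBTEST_SCORES, repeat=2)]


def percent_below(value, outcomes):
    """Simplified percentile rank: percent of outcomes strictly below value."""
    return 100 * sum(1 for outcome in outcomes if outcome < value) / len(outcomes)


# A student scoring 2 on both subtests (total 4), and one scoring 9 on both
# subtests (total 18).
LOW_SUBTEST = percent_below(2, SUBTEST_OUTCOMES)
LOW_TOTAL = percent_below(2 + 2, TOTAL_OUTCOMES)
HIGH_SUBTEST = percent_below(9, SUBTEST_OUTCOMES)
HIGH_TOTAL = percent_below(9 + 9, TOTAL_OUTCOMES)

print(LOW_SUBTEST, LOW_TOTAL)    # low scorer: subtests vs. total
print(HIGH_SUBTEST, HIGH_TOTAL)  # high scorer: subtests vs. total
```

In this model the consistently low scorer ranks lower on the total than on
either subtest, and the consistently high scorer ranks higher, because few
outcomes are extreme on both subtests at once.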

The outlying total percentile occurs frequently with individuals' scores.
However, group averages are even more prone to this phenomenon. For a group,
the average subtest scores tend to be more similar than are the subtest
scores for individuals. When first- and third-quartile points are reported,
the outlying total percentile is quite common.

Figure 5. Comparison of percentile ranks associated with the same grade
          equivalent for low, average, and high achievers: ITBS, language
          tests, grade 9, spring norms.


ANOMALY 4: THE PERCENTILE AND GRADE EQUIVALENT GROWTH ANTITHESIS
(a. What Goes Up Can Also Go Down, at the Same Time.)
(b. The Hurrieder I Go, the More Behinder I Get.)

Conclusions about a student's growth in achievement may be antithetical
depending upon the choice of grade equivalents or percentiles as the
statistic to use in expressing gains. A student must maintain or improve a
percentile rank for achievement to be considered as progressing well. In
grade equivalents, a student must gain 1.0 in a calendar year to have
demonstrated a year's normal growth. Unfortunately, neither represents a
complete picture of achievement growth, and either alone may be
misinterpreted.

a.  Consider a student who scores at the 27th percentile in grade 3 and at
    the 28th percentile in grade 4 (Language Total, Iowa Tests of Basic
    Skills, 1978, spring norms).

        Did this student make better than average progress?
        Did this student make more than a year's growth?
        Is this student closer to being "on grade level" in grade 4?

    The simple answer to each of these three questions is "No." Even though
    the student's percentile rank improved, the growth in grade equivalents
    was only from 3.0 to 3.9.

    Consider another student who scored at the 5.1 grade equivalent level in
    grade 3 and at the 6.2 level in grade 4 (same test).

        Did this student's percentile rank also improve?
        Did this student make the gain that is expected of students
        this far above grade level?

    The answer to these two questions is "No." Even though more than one
    year's growth was achieved, this student's percentile rank was 78 in
    grade 3 and 77 in grade 4.

    What is also interesting is that the achievement gap between this high
    achiever and this low achiever increased by 0.2 grade equivalent while
    their percentile rank gap closed by two points.
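
    The two gap figures follow directly from the four scores quoted above.
    A short check, using only those numbers (the variable and function names
    are illustrative):

```python
# Scores quoted in the text for the two example students.
LOW_ACHIEVER = {"grade 3": {"ge": 3.0, "pct": 27}, "grade 4": {"ge": 3.9, "pct": 28}}
HIGH_ACHIEVER = {"grade 3": {"ge": 5.1, "pct": 78}, "grade 4": {"ge": 6.2, "pct": 77}}


def gap(grade, scale):
    """High achiever minus low achiever on the given scale at the given grade."""
    return HIGH_ACHIEVER[grade][scale] - LOW_ACHIEVER[grade][scale]


# Change in the gap from grade 3 to grade 4 on each scale.
GE_GAP_CHANGE = round(gap("grade 4", "ge") - gap("grade 3", "ge"), 1)
PCT_GAP_CHANGE = gap("grade 4", "pct") - gap("grade 3", "pct")

print(f"grade equivalent gap change: {GE_GAP_CHANGE:+.1f}")
print(f"percentile gap change: {PCT_GAP_CHANGE:+d}")
```

    The grade equivalent gap grows from 2.1 to 2.3 while the percentile gap
    shrinks from 51 to 49 points.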

    To generalize, a student may gain more than 1.0 grade equivalent and
    still realize a decline in percentile rank. On the other hand, a
    student may gain less than 1.0 grade equivalent and realize a rise in
    percentile rank. Obviously, the two scales are not linked in a direct
    manner. Students who score below the first quartile do not have to gain
    1.0 grade equivalent in a year to maintain their ranking relative to
    other low achievers; however, students who score above the third quartile
    must gain more than 1.0 grade equivalent to maintain their ranking
    among the high achievers.






    Percentiles are important in interpreting gains because they provide
    the basis for answering the question "Did the student's ranking change?"
    Grade equivalents, on the other hand, do not answer this question. They
    answer the question "How much did the student learn?" The grade
    equivalent scale answers this question in units roughly equivalent to
    one year's growth for an average (50th percentile) student.

    With this distinction between the two scales being clear, the apparent
    antithesis in growth is more easily understood. High-, average-, and
    low-achieving students may "maintain" their various rankings in the
    population while making different grade equivalent gains. Only at
    the 50th percentile level would a gain of 1.0 grade equivalent be
    necessary and sufficient to maintain the same percentile rank.

    Figure 6 presents an example of this issue using the Iowa Tests of
    Basic Skills, 1978, Language Total norms for the spring, grades 3-6.
    A 25th percentile third-grade student who maintains that ranking
    across three years of instruction will gain 2.42 grade equivalents
    compared to a gain of 3.35 for a 75th percentile student. These two
    students will have maintained their relative rankings; however, the
    gap between them will have increased by over nine months in three
    years. To have prevented this gap from widening, the 25th percentile
    student would have needed an increase from the 25th to the 41st
    percentile across these three years. For these two students, equal gains
    in grade equivalents would have resulted in a 16 percentile point
    greater gain for the lower achiever.

Figure 6. Grade equivalent gains made by low, average, and high achievers
          who maintain the same percentile rank across three years: ITBS,
          Language Total, spring norms.

b.  For individual students, the importance of inspecting both percentiles
    and grade equivalents when interpreting achievement growth is obvious
    from the preceding examples. The same importance is present when
    considering measures of central tendency for groups. A frequent analysis
    for public school systems is a comparison of the gap between minority
    and majority student groups' achievement levels from one school year to
    another. When one group's median scores are above the 50th percentile
    and the other's are below, there is real potential for simplistic
    conclusions which may be misleading.



    After a couple of years of reporting that the percentile gap between
    our minority students and our majority students had been narrowing,
    we decided to project when the gap would be closed if current trends
    continued. What we found was that the gap would not ever close. In
    short, we found that the minorities were gaining less from year to
    year in terms of grade equivalents than were the majorities. The
    higher gains in percentiles were an artifact of their relative
    location in the distribution of scores.

ANOMALY 5: THE SAME GRADE EQUIVALENT EARNS A DIFFERENT
PERCENTILE IN EACH CONTENT AREA
(Six of One is Larger than Half a Dozen of the Other)

So many educators who are surprised when they find that a
grade equivalent of 8.2 is the 90th percentile in language
but the 96th percentile in math are the same people who
say "of course" when someone states that children vary
more in their language skills than in their math skills.
People can get quite frustrated, however, to find that
they cannot straightforwardly compare grade equivalents
across content areas.

Figure 7 presents the percentiles associated with certain
grade equivalents for the ITBS Language Total and Math
Total, grade 5, spring norms. Notice that only at the
50th percentile are the two percentile scales matched up.
By definition, they have to match at the 50th percentile.
However, since math skills do vary less across students
than do language skills, the math grade equivalents change
more slowly as one moves to percentiles either higher or
lower than the 50th percentile.

A student who is at the 90th percentile on both tests
receives a grade equivalent of 8.2 on Language Total and
a 7.6 on Math Total. This student is ahead of the same
proportion of peers in both areas, but is farther above
grade level in language.
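
A rough model reproduces these figures. The sketch below assumes, purely for
illustration, that grade equivalents in each content area are normally
distributed around a common median of 5.8 (an assumed value for the 50th
percentile in the spring of grade 5), with each spread fixed by the quoted
90th percentile score. The actual norms need not be normal, so this is an
approximation and not the publisher's table.

```python
from statistics import NormalDist

MEDIAN_GE = 5.8  # assumed grade equivalent at the 50th percentile
Z_90 = NormalDist().inv_cdf(0.90)  # standard normal deviate at the 90th percentile

# Spread chosen so the 90th percentile matches the values quoted in the text.
LANGUAGE = NormalDist(mu=MEDIAN_GE, sigma=(8.2 - MEDIAN_GE) / Z_90)
MATH = NormalDist(mu=MEDIAN_GE, sigma=(7.6 - MEDIAN_GE) / Z_90)


def percentile(distribution, grade_equivalent):
    """Percent of the modeled population scoring below the grade equivalent."""
    return 100 * distribution.cdf(grade_equivalent)


for ge in (MEDIAN_GE, 7.6, 8.2):
    print(f"GE {ge}: language {percentile(LANGUAGE, ge):.1f}, "
          f"math {percentile(MATH, ge):.1f}")
```

Under these assumptions a grade equivalent of 8.2 falls at the 90th
percentile in language and at roughly the 96th in math, because the narrower
math distribution places the same score farther out in its tail.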




Figure 7. Comparison of grade equivalent and percentile
          scores: ITBS, Language Total and Math Total,
          grade 5, spring norms.



ANOMALY 6: THE MEDIAN DOES NOT REPRESENT ANY GROUP
(Nobody Wants to be Considered Average.)

Achievement test scores have a mystique. School personnel may feel
uncomfortable talking about them because they do not understand the terms
used, such as percentile, grade equivalents, standard error, or normal curve
equivalent. Even in districts where intensive efforts have been made to
educate the personnel to a few terms, anxiety and misunderstanding still
abound. Our office has attempted to ensure that all AISD school personnel
understand the basic statistic used in reporting our achievement results:
the median percentile score. We have not been truly successful.

A partial explanation for this inability to understand this "simple" concept
may be that it is not, as discovered during examination of our 1981 test
results, a simple concept in practice. There are times when the median
percentile score for a school does not really represent any single group of
students (grouping along traditional lines, like ethnicity). In this
situation the score may seem incorrect and meaningless, and school personnel
may indeed lose confidence in the utility of the score.

Figure 8 provides a case in which the median percentile score for all
students tested does not really represent any one group of students by
ethnicity. In reading, the Anglo/Other median was 31 points higher than the
total group median, with the medians for Black and Hispanic students 18 and
20 points lower than the total group median percentile score. The
third-quartile scores for the Black and Hispanic students are lower than the
total group median score, and the first-quartile score for the Anglo/Other
group is equal to the total group median percentile score. Thus we have two
contrasting groups in terms of achievement, the Anglo/Other and the
Black/Hispanic. The total group median percentile is a score which really
does not represent any group in the school.

Ethnicity aside, this is a school which has many high achievers and many low
achievers and fewer average ones: definitely bimodal. A single school median
masks this. Whenever possible, subgroup medians need to be examined prior to
using a total group median to describe a school.

                    Total      Black      Hispanic   Anglo/Other
  Number Tested     428        66         132        230

  Third Quartile    63 %ile    21 %ile    24 %ile    77 %ile
  Median            30 %ile    12 %ile    10 %ile    61 %ile
  First Quartile    10 %ile     5 %ile     4 %ile    30 %ile

Figure 8. 1981 STEP II Percentile Scores, Grade 9, Reading.
          (Actual School Data)
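
One simple screen for this situation, offered only as a sketch, is to ask
whether the total group median falls strictly inside any subgroup's middle
half (between its first and third quartiles). The values below are those of
Figure 8; the strict-inequality rule and the function name are assumptions
made for illustration.

```python
# Percentile scores from Figure 8 (1981 STEP II, Grade 9, Reading).
FIGURE_8 = {
    "Total": {"q1": 10, "median": 30, "q3": 63},
    "Black": {"q1": 5, "median": 12, "q3": 21},
    "Hispanic": {"q1": 4, "median": 10, "q3": 24},
    "Anglo/Other": {"q1": 30, "median": 61, "q3": 77},
}


def subgroups_represented(table):
    """Subgroups whose middle half strictly contains the total group median."""
    total_median = table["Total"]["median"]
    return [
        name
        for name, quartiles in table.items()
        if name != "Total" and quartiles["q1"] < total_median < quartiles["q3"]
    ]


REPRESENTED = subgroups_represented(FIGURE_8)
print(REPRESENTED or "total median lies in no subgroup's middle half")
```

For this school the list is empty: the total median of 30 sits above the
third quartile of two subgroups and exactly at the first quartile of the
third.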



Conclusion



In addition to curriculum/test content matches, reliability estimates,
and a dozen other issues which confound the straightforward
interpretation of achievement test scores, the interaction of types of
scores such as percentiles and grade equivalents, shifts in student
demographics, and non-normal distributions within groups being tested
must be carefully considered in interpreting achievement test results.
This paper describes six anomalies which have been recently encountered
by our school system. As long as we evaluators and researchers are called
upon to interpret test results, we must be able to distinguish real
achievement gains from artifactual gains resulting from anomalies such
as those discussed here.






